Objective

Develop a model that reflects the significance of cover, substrate, depth, and velocity on fish presence and absence.

Analysis Overview

The model was developed using Feather River mini snorkel data. The dataset consists of numeric fish count observations that can also be expressed as a binary presence–absence response. Because the counts were highly zero-inflated (i.e., many observations with no fish detected), we initially evaluated a hurdle modeling approach following the framework described in Gard (2024, in review). Hurdle models are well suited for datasets dominated by absences, as they separately model the processes governing occurrence and abundance. In our case, the hurdle model provided reasonable performance for the presence–absence (zero) component but performed poorly for the count component, indicating that fish abundance could not be reliably predicted from the available covariates. Given this limitation, we shifted our focus to occurrence modeling using logistic regression. The strongest overall performance was obtained using a mixed-effects logistic regression that included a random intercept for transect sites nested within channel location (high-flow and low-flow channels), allowing us to account for spatial structure and repeated sampling within sites.

Review Data

## Rows: 17,882
## Columns: 44
## $ micro_hab_data_tbl_id                       <dbl> 18, 18, 18, 19, 20, 21, 22…
## $ location_table_id                           <dbl> 11, 11, 11, 11, 11, 11, 11…
## $ transect_code                               <dbl> 0.1, 0.1, 0.1, 0.2, 0.3, 0…
## $ fish_data_id                                <dbl> 21, 22, 23, NA, NA, NA, 25…
## $ date                                        <date> 2001-03-14, 2001-03-14, 2…
## $ count                                       <dbl> 2, 3, 1, 0, 0, 0, 3, 0, 0,…
## $ species                                     <chr> "chinook salmon", "chinook…
## $ fl_mm                                       <dbl> 35, 35, 25, NA, NA, NA, 25…
## $ dist_to_bottom                              <dbl> 1.0, 1.5, 1.5, NA, NA, NA,…
## $ depth                                       <dbl> 17, 17, 17, 19, 11, 12, 11…
## $ focal_velocity                              <dbl> 0.94, 0.16, 0.16, NA, NA, …
## $ velocity                                    <dbl> 0.22, 0.22, 0.22, 0.35, 1.…
## $ surface_turbidity                           <dbl> 20, 20, 20, 30, 30, 30, 10…
## $ percent_fine_substrate                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_sand_substrate                      <dbl> 40, 40, 40, 50, 25, 0, 70,…
## $ percent_small_gravel_substrate              <dbl> 20, 20, 20, 40, 75, 80, 30…
## $ percent_large_gravel_substrate              <dbl> 30, 30, 30, 10, 0, 20, 0, …
## $ percent_cobble_substrate                    <dbl> 10, 10, 10, 0, 0, 0, 0, 0,…
## $ percent_boulder_substrate                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_no_cover_inchannel                  <dbl> 75, 75, 75, 100, 100, 100,…
## $ percent_small_woody_cover_inchannel         <dbl> 15, 15, 15, 0, 0, 0, 20, 0…
## $ percent_large_woody_cover_inchannel         <dbl> 0, 0, 0, 0, 0, 0, 40, 0, 0…
## $ percent_submerged_aquatic_veg_inchannel     <dbl> 10, 10, 10, 0, 0, 0, 30, 0…
## $ percent_undercut_bank                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_no_cover_overhead                   <dbl> 100, 100, 100, 100, 100, 1…
## $ percent_cover_half_meter_overhead           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_cover_more_than_half_meter_overhead <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ channel_geomorphic_unit                     <chr> "glide", "glide", "glide",…
## $ location                                    <chr> "hatchery ditch", "hatcher…
## $ channel_location                            <chr> "LFC", "LFC", "LFC", "LFC"…
## $ water_temp                                  <dbl> 47, 47, 47, 47, 47, 47, 47…
## $ weather                                     <chr> "direct sunlight", "direct…
## $ flow                                        <dbl> 12, 12, 12, 12, 12, 12, 12…
## $ number_of_divers                            <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ reach_length                                <dbl> 25, 25, 25, 25, 25, 25, 25…
## $ reach_width                                 <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ channel_width                               <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7,…
## $ channel_type                                <chr> "sidechannel", "sidechanne…
## $ river_mile                                  <dbl> 66.6, 66.6, 66.6, 66.6, 66…
## $ coordinate_method                           <chr> "assigned based on similar…
## $ latitude                                    <dbl> 39.51602, 39.51602, 39.516…
## $ longitude                                   <dbl> -121.5588, -121.5588, -121…
## $ fish_presence                               <fct> 1, 1, 1, 0, 0, 0, 1, 0, 0,…
## $ month                                       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3,…

Outliers

Count outliers exist in both the high flow and low flow channels. Removing them did not change the model results, so they were kept in the dataset.

We expected that removing values greater than 250 would limit overdispersion in the count model, but it did not.

High Flow vs. Low Flow Channel

Table 1 and Figure 2 explore whether fish presence differed between the high flow and low flow channels. Total fish counts were of the same order of magnitude in both channels, though higher in the low flow channel (17,073 vs. 11,082; Table 1). More fish are present in the high flow channel in March, but they move downstream quickly. Fish remain in the low flow channel for much longer (Figure 2).

To check whether sampling effort was comparable between the two channels, Table 2 reports the number of sampling sites in each.

Table 1. Total count of fish in the high flow and low flow channels

channel_location   n
HFC                11082
LFC                17073

Table 2. Number of sampling sites in the high flow and low flow channels

channel_location   n_sites
HFC                16
LFC                13

Redd Location Exploration

Process

  • Source Feather River redd data from EDI
  • Remove locations that have zero redds
  • Spatially join redd locations to the mini snorkel transect locations
  • Summarize by total number of redds and presence/absence

Caveats

  • The temporal range of the redd data (2014-2023) differs from that of the mini snorkel data (2001, 2002). This exploration summarizes the entire redd dataset.

Redd data over time

This visual represents the number of redds over time at each location. It helps provide context on which sites generally have redds and if they have redds consistently over time. Qualitatively, it seems like there are more redds counted over time.

Combine redd data with mini snorkel data to see whether redds are a spatial indicator of spawning potential or habitat quality.

Find the nearest transect location from the Mini Snorkel data to each redd

Redds are joined to the nearest mini snorkel transect location within 50 meters. The following histogram shows the distribution of redd distances; the 50-meter cutoff (shown as the red dashed line) was chosen because most of the high counts fall within it.

A visual representation of the number of redds joined to each of the mini snorkel transect locations:
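The nearest-transect join can be sketched in base R as follows. This is only an illustration under stated assumptions: the coordinates, site names, and `redd_id` values are made up, and a haversine great-circle distance stands in for whatever spatial join was actually used in the analysis.

```r
# Great-circle distance in meters (haversine); Earth radius is approximate.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Hypothetical transect and redd locations (not the real data).
transects <- data.frame(
  location  = c("site_a", "site_b"),
  latitude  = c(39.5160, 39.5200),
  longitude = c(-121.5588, -121.5600),
  stringsAsFactors = FALSE
)
redds <- data.frame(
  redd_id   = 1:3,
  latitude  = c(39.5161, 39.5201, 39.5300),
  longitude = c(-121.5588, -121.5600, -121.5588)
)

# For each redd, find the closest transect and the distance to it.
nearest <- do.call(rbind, lapply(seq_len(nrow(redds)), function(i) {
  d <- haversine_m(redds$latitude[i], redds$longitude[i],
                   transects$latitude, transects$longitude)
  j <- which.min(d)
  data.frame(redd_id = redds$redd_id[i],
             location = transects$location[j],
             distance_m = d[j],
             stringsAsFactors = FALSE)
}))

# Keep only redds within the 50 m cutoff, then summarize per transect.
max_distance_m <- 50
joined <- nearest[nearest$distance_m <= max_distance_m, ]
redd_summary <- aggregate(redd_id ~ location, data = joined, FUN = length)
names(redd_summary)[2] <- "redd_total"
redd_summary$redd_presence <- as.integer(redd_summary$redd_total > 0)
print(redd_summary)
```

In this toy example the third redd is more than a kilometer from either transect, so it is dropped by the cutoff.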

Outmigration Analysis

Goal

Understand timing and patterns of juvenile outmigration on the Feather River and compare to timing and density of fish observations in the mini snorkel dataset.

Insights

  • The majority of catch in RSTs on the Feather River (both LFC and HFC) has passed through by March (~80%)
  • There are only small differences in cumulative catch curves between the HFC and LFC, indicating that outmigration is not affecting these sites differently
  • The fish that remain in the Feather River after March and into May and June are likely larger (> 50 mm), which aligns with the fork length distributions by month in the habitat data

Variables of Interest

The variables of interest include cover, substrate, velocity and depth variables known to be important for salmon rearing habitat. We are also including the number of redds found near each mini snorkel transect and whether or not there were redds nearby.

  • Velocity - numeric
  • Depth - numeric
  • Number of redds - numeric
  • Redd at location - 0/1

Substrate and cover variables, which are measured as percentages, are converted to binary presence/absence (1/0) using a threshold of 20%. Overhanging vegetation was measured at 1/2 meter overhead and at more than 1/2 meter overhead. These categories were combined for simplicity and for comparison with other studies, such as those by Mark Gard.

  • Undercut bank - 0/1
  • Aquatic vegetation - 0/1
  • Overhanging vegetation - 0/1
  • Small woody cover - 0/1
  • Large woody cover - 0/1
  • Boulder substrate - 0/1
  • Cobble substrate - 0/1
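The thresholding step can be illustrated with a few lines of base R. This is a sketch, not the code used for the analysis: the toy values are invented, the inclusive comparison (`>=`) is an assumption (it agrees with the data previews in this document, where 20% maps to 1 and 15% maps to 0), and summing the two overhead classes before thresholding is an assumption about how they were combined.

```r
# Toy data; column names follow the dataset, values are made up.
habitat <- data.frame(
  percent_small_woody_cover_inchannel         = c(15, 0, 20),
  percent_cover_half_meter_overhead           = c(0, 10, 5),
  percent_cover_more_than_half_meter_overhead = c(0, 15, 5)
)

cover_threshold <- 20  # percent

# 1 if the percentage meets the threshold, 0 otherwise (NA stays NA).
# Assumption: the comparison is inclusive (>=).
to_presence <- function(percent, threshold = cover_threshold) {
  as.integer(percent >= threshold)
}

habitat$small_woody <- to_presence(habitat$percent_small_woody_cover_inchannel)

# Assumption: the two overhead classes are summed before thresholding.
habitat$overhanging_veg <- to_presence(
  habitat$percent_cover_half_meter_overhead +
    habitat$percent_cover_more_than_half_meter_overhead
)

print(habitat[, c("small_woody", "overhanging_veg")])
```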

Data were collected from March through August; however, outmigration affects fish presence later in the season. We do not include month as a predictor because, although it is a strong predictor of fish presence, its effect largely reflects outmigration timing rather than habitat. The analysis could be limited to the month with the most observations (March), though including all months also makes sense when seeking variables whose effects are stable through time.

Conditions in high-flow and low-flow channels differ substantially, and initial exploratory analyses considered fitting separate models for each channel type. Instead, we incorporated channel type (high-flow versus low-flow) as a random effect within a single logistic regression framework. This approach allowed us to account for systematic differences between channel types while retaining a unified model structure. Because additional spatial variability exists at the site (transect) level, we further nested site location within channel type, enabling the model to capture both broad channel-scale differences and finer-scale site-level nuance.
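Written out, the nested random-intercept structure described above takes roughly the following form. The notation is ours and is meant only as a sketch: $i$ indexes observations, $j$ the channel type (high-flow or low-flow), and $k$ the transect site within channel $j$.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(y_{ijk} = 1)
  &= \beta_0 + \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_j + v_{jk}, \\
u_j &\sim \mathcal{N}\!\left(0, \sigma^2_{\text{channel}}\right), \qquad
v_{jk} \sim \mathcal{N}\!\left(0, \sigma^2_{\text{site}}\right)
\end{aligned}
```

Here $\mathbf{x}_{ijk}$ holds the habitat and redd covariates, $u_j$ shifts the baseline log-odds for a whole channel type, and $v_{jk}$ adds a further site-specific shift within that channel.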

Build Model Data

All cover and substrate variables were converted to presence/absence using a threshold of 20%. The model input data have the following structure.

## Rows: 8,243
## Columns: 27
## $ count                                       <dbl> 2, 3, 1, 0, 0, 0, 3, 0, 0,…
## $ location                                    <chr> "hatchery ditch", "hatcher…
## $ channel_location                            <chr> "LFC", "LFC", "LFC", "LFC"…
## $ depth                                       <dbl> 17, 17, 17, 19, 11, 12, 11…
## $ velocity                                    <dbl> 0.22, 0.22, 0.22, 0.35, 1.…
## $ percent_small_woody_cover_inchannel         <dbl> 15, 15, 15, 0, 0, 0, 20, 0…
## $ percent_large_woody_cover_inchannel         <dbl> 0, 0, 0, 0, 0, 0, 40, 0, 0…
## $ percent_submerged_aquatic_veg_inchannel     <dbl> 10, 10, 10, 0, 0, 0, 30, 0…
## $ percent_cover_half_meter_overhead           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_cover_more_than_half_meter_overhead <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_cobble_substrate                    <dbl> 10, 10, 10, 0, 0, 0, 0, 0,…
## $ percent_boulder_substrate                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ percent_undercut_bank                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ month                                       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ channel_geomorphic_unit                     <chr> "glide", "glide", "glide",…
## $ reach_length                                <dbl> 25, 25, 25, 25, 25, 25, 25…
## $ reach_width                                 <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ channel_type                                <chr> "sidechannel", "sidechanne…
## $ small_woody                                 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0,…
## $ large_woody                                 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0,…
## $ boulder_substrate                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ cobble_substrate                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ undercut_bank                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ aquatic_veg                                 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0,…
## $ overhanging_veg                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ redd_total                                  <dbl> 2269, 2269, 2269, 2269, 22…
## $ redd_presence                               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1,…

Model Performance Evaluation Overview

Model performance was evaluated using a combination of receiver operating characteristic (ROC) analysis and confusion matrix–based classification metrics. The area under the ROC curve (AUC) was used to assess the model’s ability to discriminate between presence and absence across all possible probability thresholds. AUC provides a threshold-independent measure of performance, where values near 0.5 indicate no discrimination and higher values indicate increasing ability to correctly rank presences above absences.

In addition, model predictions were converted to binary classifications using a fixed probability threshold, and a confusion matrix was used to summarize agreement between predicted and observed outcomes. The confusion matrix tabulates true positives, true negatives, false positives, and false negatives, allowing evaluation of classification behavior such as sensitivity to presences and specificity to absences. Because the dataset was imbalanced, with many more absences than presences, emphasis was placed on interpreting the confusion matrix in conjunction with AUC rather than relying on overall accuracy alone.

Confusion matrices have the following layout. Correct predictions fall on the diagonal: val1 counts true absences (true negatives) and val4 counts true presences (true positives). False positives (val2) are cases where fish presence was predicted but not observed. False negatives (val3) are cases where fish were observed but absence was predicted.

                         Observed Absence (0)   Observed Presence (1)
Predicted Absence (0)    val1                   val3
Predicted Presence (1)   val2                   val4

Together, these metrics provide complementary perspectives on model performance: AUC characterizes overall discriminatory ability independent of threshold choice, while the confusion matrix illustrates how predictions behave at a specific cutoff and highlights trade-offs between detecting presences and avoiding false positives.
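Both quantities can be computed with base R alone, as the sketch below shows on invented data. The function names are hypothetical, the AUC is obtained from the rank-sum (Mann-Whitney) identity rather than from whichever ROC package produced the reported values, and the 0.5 cutoff is an assumption because the threshold used in the analysis is not stated.

```r
# Toy observed outcomes and predicted probabilities (made up).
observed  <- c(0, 0, 1, 1, 0, 1)
predicted <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.70)

# AUC as the probability that a random presence is ranked above a
# random absence, computed from the rank sum of the presences.
auc_rank <- function(obs, prob) {
  ranks <- rank(prob)  # average ranks handle ties
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  (sum(ranks[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Confusion matrix at a fixed cutoff, laid out as in the text:
# rows are predictions, columns are observations.
confusion <- function(obs, prob, threshold = 0.5) {
  table(
    Predicted = factor(as.integer(prob >= threshold), levels = 0:1),
    Observed  = factor(obs, levels = 0:1)
  )
}

auc <- auc_rank(observed, predicted)
cm  <- confusion(observed, predicted)
print(auc)
print(cm)
```

Changing `threshold` changes the confusion matrix but leaves the AUC untouched, which is the sense in which AUC is threshold-independent.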

Hurdle Model Approach

Hurdle Models and Interpretation

A hurdle model was used in Gard (2024, in review) to test for the effects of cover and habitat type on the total abundance of Chinook salmon at both the site and cell level. Here we explore the use of a hurdle model to help understand the influence of velocity, depth, and cover on fish count and presence/absence.

Hurdle Models

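Hurdle models are used when count data have an excess of zeros. They treat the data as arising from two processes: a binary process that determines whether the count is zero or positive, and a count process that generates the positive counts once that hurdle is crossed. Unlike zero-inflated models, in which zeros can come from either process, a hurdle model attributes every zero to the binary process.

A hurdle model therefore models the zeros separately from the rest of the data. Zero versus non-zero is modeled as a binary response, and the positive counts are modeled using a zero-truncated count distribution such as the Poisson or negative binomial (the model fitted below uses the negative binomial).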
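In symbols, the two-part structure can be sketched as below. The notation is ours: $\pi$ is the probability of a non-zero count, $f$ is the untruncated count distribution with mean $\mu$, and $\mathbf{z}$ and $\mathbf{x}$ are the covariates of the binary and count parts.

```latex
\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \\
\Pr(Y = y) &= \pi \,\frac{f(y;\mu)}{1 - f(0;\mu)}, \qquad y = 1, 2, \dots \\
\operatorname{logit}(\pi) &= \mathbf{z}^{\top}\boldsymbol{\gamma}, \qquad
\log(\mu) = \mathbf{x}^{\top}\boldsymbol{\beta}
\end{aligned}
```

Dividing by $1 - f(0;\mu)$ renormalizes the count distribution over the positive integers, which is what "zero-truncated" means here.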

Interpreting a Hurdle Model

The binary part of the model helps identify factors that influence the presence/absence of fish. The coefficients of the zero part of the hurdle model are on the log-odds scale; exponentiating them gives odds ratios for observing at least one fish.

The count part of the model estimates the effects of predictor variables on the count outcome, excluding all zero counts. Exponentiated count coefficients represent rate ratios for the expected count, given that at least one fish was observed.

The incidence rate ratio (IRR) in the count part of the model (count > 0) represents the multiplicative effect of a one-unit change in a predictor variable on the expected count of non-zero observations, assuming all other variables are held constant. For example, if the IRR for a predictor is 1.2, a one-unit increase in that predictor is associated with a 20% increase in the expected count of non-zero observations. For the binary part of the model, a coefficient of 0.5 for a predictor corresponds to an odds ratio of exp(0.5) ≈ 1.65, meaning that a one-unit increase in the predictor is associated with about a 65% increase in the odds of a positive count versus a zero count, assuming all other variables are held constant.
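The conversion from a fitted coefficient $\beta$ (on the log scale) to a ratio and a percent change is the same for both parts of the model. As a short worked sketch:

```latex
\begin{aligned}
\text{ratio} &= e^{\beta}, \qquad
\text{percent change per unit} = 100\,\bigl(e^{\beta} - 1\bigr)\% \\[4pt]
\beta = 0.5 &\;\Rightarrow\; e^{0.5} \approx 1.65
  \quad (\text{about a } 65\% \text{ increase in the odds}) \\
\text{IRR} = 1.2 &\;\Rightarrow\; \beta = \ln(1.2) \approx 0.18
  \quad (\text{a } 20\% \text{ increase in the expected count})
\end{aligned}
```

A coefficient is only approximately equal to the proportional change when it is close to zero, which is why 0.5 does not translate to 50%.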

Build Model

Hurdle Model Results Summary

## Start:  AIC=7550.06
## count ~ small_woody + depth + velocity + large_woody + aquatic_veg + 
##     overhanging_veg + cobble_substrate + boulder_substrate + 
##     undercut_bank + redd_total + redd_presence
## 
##                     Df    AIC
## - boulder_substrate  2 7548.0
## - aquatic_veg        2 7548.9
## <none>                 7550.1
## - velocity           2 7551.2
## - large_woody        2 7552.0
## - cobble_substrate   2 7552.6
## - undercut_bank      2 7553.7
## - redd_presence      2 7555.3
## - small_woody        2 7583.4
## - overhanging_veg    2 7599.6
## - depth              2 7611.4
## - redd_total         2 7839.3
## 
## Step:  AIC=7548.05
## count ~ small_woody + depth + velocity + large_woody + aquatic_veg + 
##     overhanging_veg + cobble_substrate + undercut_bank + redd_total + 
##     redd_presence
## 
##                    Df    AIC
## - aquatic_veg       2 7546.8
## <none>                7548.0
## - velocity          2 7548.9
## - large_woody       2 7550.0
## - cobble_substrate  2 7550.9
## - undercut_bank     2 7551.7
## - redd_presence     2 7553.5
## - small_woody       2 7581.7
## - overhanging_veg   2 7597.9
## - depth             2 7611.1
## - redd_total        2 7842.4
## 
## Step:  AIC=7546.83
## count ~ small_woody + depth + velocity + large_woody + overhanging_veg + 
##     cobble_substrate + undercut_bank + redd_total + redd_presence
## 
##                    Df    AIC
## - velocity          2 7546.8
## <none>                7546.8
## - large_woody       2 7548.0
## - cobble_substrate  2 7549.4
## - undercut_bank     2 7550.1
## - redd_presence     2 7553.3
## - small_woody       2 7580.3
## - overhanging_veg   2 7600.4
## - depth             2 7614.3
## - redd_total        2 7843.6
## 
## Step:  AIC=7546.78
## count ~ small_woody + depth + large_woody + overhanging_veg + 
##     cobble_substrate + undercut_bank + redd_total + redd_presence
## 
##                    Df    AIC
## <none>                7546.8
## - large_woody       2 7547.9
## - cobble_substrate  2 7548.1
## - undercut_bank     2 7550.3
## - redd_presence     2 7553.9
## - small_woody       2 7581.2
## - overhanging_veg   2 7606.3
## - depth             2 7618.7
## - redd_total        2 7844.7
## 
## Call:
## pscl::hurdle(formula = count ~ small_woody + depth + large_woody + overhanging_veg + 
##     cobble_substrate + undercut_bank + redd_total + redd_presence, data = model_data, 
##     dist = "negbin")
## 
## Pearson residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56447 -0.11661 -0.08367 -0.06081 68.00163 
## 
## Count model coefficients (truncated negbin with log link):
##                      Estimate   Std. Error z value       Pr(>|z|)    
## (Intercept)       -8.77945308  58.66046184  -0.150         0.8810    
## small_woody        0.02948715   0.27955914   0.105         0.9160    
## depth              0.04204054   0.00608787   6.906 0.000000000005 ***
## large_woody       -1.20049563   0.60722985  -1.977         0.0480 *  
## overhanging_veg    0.45097562   0.23209497   1.943         0.0520 .  
## cobble_substrate  -0.55535368   0.22954885  -2.419         0.0155 *  
## undercut_bank     -0.60927102   0.58366515  -1.044         0.2965    
## redd_total        -0.00010424   0.00009242  -1.128         0.2594    
## redd_presence     -0.71336048   0.43567690  -1.637         0.1016    
## Log(theta)       -13.53849069  58.65926490  -0.231         0.8175    
## Zero hurdle model coefficients (binomial with logit link):
##                     Estimate  Std. Error z value             Pr(>|z|)    
## (Intercept)      -4.08635476  0.14771368 -27.664 < 0.0000000000000002 ***
## small_woody       0.79626395  0.12377469   6.433    0.000000000124967 ***
## depth             0.00661963  0.00187579   3.529             0.000417 ***
## large_woody       0.52890047  0.33840759   1.563             0.118074    
## overhanging_veg   0.81196392  0.10224205   7.942    0.000000000000002 ***
## cobble_substrate  0.00355965  0.10336159   0.034             0.972527    
## undercut_bank     0.87564607  0.33138672   2.642             0.008233 ** 
## redd_total        0.00089709  0.00005704  15.726 < 0.0000000000000002 ***
## redd_presence     0.43263205  0.16044130   2.697             0.007007 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta: count = 0
## Number of iterations in BFGS optimization: 44 
## Log-likelihood: -3754 on 19 Df

Hurdle Model Performance

Evaluate the presence / absence component (classification)

## Area under the curve: 0.7428
##          Observed
## Predicted    0    1
##         0 2694   58
##         1 4971  520

An AUC of 0.74 indicates that the habitat variables have some, though weak, ability to discriminate presence from absence. However, the confusion matrix shows that presence was heavily over-predicted: 4,971 of the 5,491 predicted presences were false positives, which is likely due to the dataset being so heavily skewed toward absence. Because non-zero counts were sparse and weakly explained, abundance predictions were unreliable.

Evaluate the count component (abundance, conditional on presence)

Only observations where count > 0 are evaluated.

##          rmse           mae            r2 
## 211.704785516  49.784264937   0.003045783

The count component of the hurdle model performed very poorly (RMSE = 211.7, MAE = 49.8, R² = 0.003), indicating that the model explained virtually none of the variation in non-zero fish counts. Given the structure of the dataset, which is characterized by a high frequency of absences and occasional extremely large count values, this lack of performance is not unexpected: such conditions make it difficult for the count component of a hurdle model to reliably capture abundance patterns.
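For reference, the three metrics can be computed as in the base R sketch below. The function name and toy values are hypothetical, and defining R² as the squared correlation between observed and predicted counts is an assumption; the report does not say which definition produced the value above.

```r
# Error metrics for the count component, restricted to non-zero observations.
count_metrics <- function(observed, predicted) {
  keep <- observed > 0          # evaluate only where fish were observed
  obs  <- observed[keep]
  pred <- predicted[keep]
  c(
    rmse = sqrt(mean((pred - obs)^2)),  # penalizes large misses heavily
    mae  = mean(abs(pred - obs)),       # average absolute miss
    r2   = cor(obs, pred)^2             # assumption: squared correlation
  )
}

# Toy counts (made up); the first observation is a zero and is dropped.
observed  <- c(0, 1, 2, 4, 10)
predicted <- c(3, 2, 2, 3, 6)

metrics <- count_metrics(observed, predicted)
print(round(metrics, 3))
```

Because RMSE squares the errors, a handful of very large counts can dominate it, which is consistent with the RMSE being far larger than the MAE in the results above.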

Hurdle Model Discussion

The hurdle model showed reasonable performance for the presence–absence component (AUC = 0.7427608), indicating that the predictors captured meaningful structure in fish occurrence. However, the count component exhibited poor predictive skill, likely due to the distribution of fish counts, which was dominated by zeros and characterized by extremely high variability among non-zero observations (median = 0, mean = 2.8, maximum = 1500). This combination of many low counts and occasional extreme values violates the assumptions of a simple Poisson or truncated count process and limits the model’s ability to reliably predict abundance. As a result, we elected to focus subsequent analyses on habitat associations for fish occurrence using logistic regression, which is more consistent with the information content and statistical properties of the data.

Logistic Regression Approach

Because fish counts were highly zero-inflated and exhibited extreme variability among non-zero values, abundance models performed poorly. We therefore focused on modeling fish occurrence using logistic regression, which better matches the information content of the data and provides more reliable inference on habitat associations.

Build Logistic Regression Model

We start with a simple logistic regression that includes fixed effects only.

## 
## Call:
## glm(formula = presence ~ small_woody + depth + velocity + large_woody + 
##     aquatic_veg + overhanging_veg + cobble_substrate + boulder_substrate + 
##     undercut_bank + redd_total + redd_presence, family = binomial(link = "logit"), 
##     data = log_reg_data)
## 
## Coefficients:
##                      Estimate  Std. Error z value             Pr(>|z|)    
## (Intercept)       -4.09736131  0.15583918 -26.292 < 0.0000000000000002 ***
## small_woody        0.78604199  0.12463662   6.307    0.000000000285103 ***
## depth              0.00663209  0.00192011   3.454             0.000552 ***
## velocity          -0.01323947  0.06861278  -0.193             0.846991    
## large_woody        0.55918285  0.33917667   1.649             0.099220 .  
## aquatic_veg        0.08053934  0.11499397   0.700             0.483691    
## overhanging_veg    0.78861493  0.10771446   7.321    0.000000000000245 ***
## cobble_substrate  -0.04310491  0.10946228  -0.394             0.693738    
## boulder_substrate  0.22225014  0.16934987   1.312             0.189394    
## undercut_bank      0.88442338  0.33210044   2.663             0.007742 ** 
## redd_total         0.00090105  0.00005271  17.096 < 0.0000000000000002 ***
## redd_presence      0.42787147  0.15957265   2.681             0.007332 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4186.6  on 8242  degrees of freedom
## Residual deviance: 3475.0  on 8231  degrees of freedom
## AIC: 3499
## 
## Number of Fisher Scoring iterations: 6
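A fit of this kind can be reproduced in outline with base R's `glm`. The sketch below uses simulated data with only two predictors, so the coefficients it prints are not comparable to the summary above; the data frame `sim` and the simulation parameters are assumptions made for illustration.

```r
set.seed(1)
n <- 5000

# Simulated habitat covariates (made up, not the Feather River data).
sim <- data.frame(
  small_woody = rbinom(n, 1, 0.2),
  depth       = runif(n, 5, 60)
)

# Simulate presence from a known logistic relationship.
linear_predictor <- -3 + 0.8 * sim$small_woody + 0.01 * sim$depth
sim$presence <- rbinom(n, 1, plogis(linear_predictor))

# Fixed-effects logistic regression, same family and link as in the report.
fit <- glm(presence ~ small_woody + depth,
           family = binomial(link = "logit"), data = sim)

# Exponentiated coefficients are odds ratios; predictions are probabilities.
odds_ratios <- exp(coef(fit))
fitted_prob <- predict(fit, type = "response")
print(round(odds_ratios, 3))
```

The fitted probabilities in `fitted_prob` are what the ROC analysis and the confusion matrix in the next section are computed from.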

Performance

## Area under the curve: 0.7939

An AUC of 0.79 means the habitat variables explain some presence–absence structure, but there is still a lot of overlap and noise in the data.

Confusion Matrix:

##          Observed
## Predicted    0    1
##         0 7620  534
##         1   45   44
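The rates implied by this matrix can be worked out directly. The short base R sketch below re-enters the four reported cells by hand and computes sensitivity, specificity, and overall accuracy.

```r
# Reported confusion matrix, entered column by column.
cm <- matrix(
  c(7620, 45,    # Observed 0: predicted 0, predicted 1
    534,  44),   # Observed 1: predicted 0, predicted 1
  nrow = 2,
  dimnames = list(Predicted = c("0", "1"), Observed = c("0", "1"))
)

sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # presences correctly predicted
specificity <- cm["0", "0"] / sum(cm[, "0"])  # absences correctly predicted
accuracy    <- sum(diag(cm)) / sum(cm)        # all correct predictions

print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              accuracy    = accuracy), 3))
```

Overall accuracy is about 93%, yet only about 8% of observed presences are classified as present. This is the imbalance effect described earlier: with so many absences, accuracy alone says little about how well presences are detected.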

Logistic Regression with Random Effect

Because fish occurrence varied among transect sites and between high-flow and low-flow channels, we adopted a mixed-effects logistic regression framework to account for this spatial heterogeneity. The model estimates the probability of fish presence as a function of local habitat features and spawning context, while allowing baseline occurrence to vary among channel types and individual transect sites through random effects.

We first fit a model that included transect location as a random intercept and then extended this structure by nesting transect locations within channel type. The nested model changed overall discriminatory performance only marginally (AUC 0.8309 versus 0.8305) and had a slightly higher AIC (3366.0 versus 3364.5), while classification outcomes at a fixed threshold remained unchanged. The additional channel-level random effect therefore made at most a small difference to how the model ranks sites by likelihood of presence and did not alter binary predictions.

Random Effect of Location

Using the transect location (29 sites) as the random effect:

##  Family: binomial  ( logit )
## Formula:          
## presence ~ small_woody + depth + velocity + large_woody + aquatic_veg +  
##     overhanging_veg + cobble_substrate + boulder_substrate +  
##     undercut_bank + redd_total + redd_presence + (1 | location)
## Data: log_reg_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##    3364.5    3455.7   -1669.3    3338.5      8230 
## 
## Random effects:
## 
## Conditional model:
##  Groups   Name        Variance Std.Dev.
##  location (Intercept) 0.5175   0.7193  
## Number of obs: 8243, groups:  location, 29
## 
## Conditional model:
##                     Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)       -4.5756305  0.3092327 -14.797 < 0.0000000000000002 ***
## small_woody        0.6982272  0.1263098   5.528    0.000000032409419 ***
## depth              0.0111385  0.0019227   5.793    0.000000006905580 ***
## velocity          -0.1479476  0.0768626  -1.925               0.0543 .  
## large_woody        0.3120318  0.3484225   0.896               0.3705    
## aquatic_veg        0.1862350  0.1181501   1.576               0.1150    
## overhanging_veg    0.8132166  0.1109046   7.333    0.000000000000226 ***
## cobble_substrate   0.2508117  0.1211035   2.071               0.0384 *  
## boulder_substrate  0.2571316  0.1741170   1.477               0.1397    
## undercut_bank      0.8133447  0.3474092   2.341               0.0192 *  
## redd_total         0.0009662  0.0002332   4.143    0.000034338470204 ***
## redd_presence      0.4723638  0.3715057   1.271               0.2036    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Area under the curve: 0.8305
##          Observed
## Predicted    0    1
##         0 7622  501
##         1   43   77

Random Effect of Channel Location | Location

Nesting the transect location within the high flow/low flow channel as the random effect:

##  Family: binomial  ( logit )
## Formula:          
## presence ~ small_woody + depth + velocity + large_woody + aquatic_veg +  
##     overhanging_veg + cobble_substrate + boulder_substrate +  
##     undercut_bank + redd_total + redd_presence + (1 | channel_location/location)
## Data: log_reg_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##    3366.0    3464.3   -1669.0    3338.0      8229 
## 
## Random effects:
## 
## Conditional model:
##  Groups                    Name        Variance Std.Dev.
##  location:channel_location (Intercept) 0.4520   0.6723  
##  channel_location          (Intercept) 0.1192   0.3453  
## Number of obs: 8243, groups:  
## location:channel_location, 29; channel_location, 2
## 
## Conditional model:
##                     Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)       -4.2561462  0.5103236  -8.340 < 0.0000000000000002 ***
## small_woody        0.6980083  0.1262417   5.529    0.000000032180020 ***
## depth              0.0112233  0.0019222   5.839    0.000000005259428 ***
## velocity          -0.1497660  0.0769135  -1.947             0.051511 .  
## large_woody        0.3154602  0.3483895   0.905             0.365210    
## aquatic_veg        0.1881525  0.1181251   1.593             0.111200    
## overhanging_veg    0.8078986  0.1109635   7.281    0.000000000000332 ***
## cobble_substrate   0.2453731  0.1212036   2.024             0.042922 *  
## boulder_substrate  0.2513921  0.1740993   1.444             0.148751    
## undercut_bank      0.8075876  0.3465553   2.330             0.019789 *  
## redd_total         0.0008449  0.0002514   3.360             0.000778 ***
## redd_presence      0.0893492  0.5481530   0.163             0.870518    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Area under the curve: 0.8309
##          Observed
## Predicted    0    1
##         0 7622  501
##         1   43   77

Figures

The following figure shows the estimated effect of each site on fish presence, relative to the overall mean. Points greater than 1 indicate higher odds of fish presence and points less than 1 indicate lower odds. Wider bars represent greater uncertainty.

The following figure shows the estimated effect of each predictor on fish presence, holding all other predictors constant. Odds ratios greater than 1 indicate higher odds of fish presence and odds ratios less than 1 indicate lower odds. Wider bars represent greater uncertainty.

Although depth and total number of redds have odds ratios close to one, both predictors are statistically significant, indicating small but consistent effects on fish presence. The magnitude of these effects is modest on a per-unit basis, but their significance reflects the precision of the estimates and the large sample size rather than a lack of biological relevance.
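The odds ratios discussed here follow from exponentiating the reported coefficients. The sketch below does this for a few terms of the nested model, copying estimates and standard errors from the summary above. The Wald interval (estimate ± 1.96 standard errors on the log-odds scale) is an assumption; the figure may have used a different interval method.

```r
# Estimates and standard errors copied from the nested-model summary.
coefs <- data.frame(
  term      = c("small_woody", "depth", "overhanging_veg", "undercut_bank"),
  estimate  = c(0.6980083, 0.0112233, 0.8078986, 0.8075876),
  std_error = c(0.1262417, 0.0019222, 0.1109635, 0.3465553)
)

# Odds ratios with approximate 95% Wald intervals (assumed interval type).
z <- qnorm(0.975)
coefs$odds_ratio <- exp(coefs$estimate)
coefs$lower      <- exp(coefs$estimate - z * coefs$std_error)
coefs$upper      <- exp(coefs$estimate + z * coefs$std_error)

# A small per-unit effect accumulates: odds ratio for a 10-unit depth increase.
depth_or_10 <- exp(10 * coefs$estimate[coefs$term == "depth"])

print(cbind(coefs["term"], round(coefs[, c("odds_ratio", "lower", "upper")], 2)))
print(round(depth_or_10, 3))
```

The per-unit odds ratio for depth is close to 1, but over ten depth units it compounds to roughly 1.12, which illustrates how a small per-unit effect can still matter across the observed range.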

Logistic Regression with Random Effect Discussion

After accounting for local habitat characteristics, fish occurrence remained strongly structured by site and channel context. Random-intercept estimates for transect sites nested within channel location indicated that some sites exhibited persistently elevated probabilities of fish presence, whereas others were characterized by consistently reduced occurrence. For example, Hatchery Ditch (LFC) and Junkyard Riffle (HFC) showed markedly positive random effects, corresponding to higher odds of presence relative to the global mean, while Vance Avenue (HFC) and Auditorium Riffle (LFC) exhibited substantially lower occurrence. These patterns suggest the influence of additional reach-scale or contextual factors not captured by the measured microhabitat variables, such as longitudinal connectivity, geomorphic setting, or broader hydraulic and thermal conditions.

Within this spatial context, local habitat structure emerged as the strongest and most consistent driver of fish occurrence. Sites characterized by overhanging riparian vegetation, undercut banks, and small woody cover were substantially more likely to support fish presence, reflecting the importance of physical complexity and refuge in shaping habitat use. These three structural features showed large, statistically significant positive effects, with odds ratios of approximately 2.0 to 2.2 in the nested mixed-effects model (about 2.2 to 2.4 in the fixed-effects logistic regression). The estimates for small woody cover and overhanging vegetation had relatively narrow confidence intervals, indicating both strong effect sizes and high certainty; the undercut bank estimate was less precise (standard error ≈ 0.35 on the log-odds scale). Large woody cover did not show a statistically significant effect.

Hydraulic variables played a secondary role in explaining occurrence patterns. Depth was statistically significant but associated with a small per-unit effect (odds ratio ≈ 1.01), suggesting a modest but consistent increase in presence probability that accumulates across the observed depth range. Velocity exhibited a weak, marginally significant negative association with occurrence, indicating a possible but uncertain influence once structural habitat features were accounted for.

Responses to substrate variables were mixed. Cobble substrate showed a small but statistically significant positive association with presence in the mixed-effects model, whereas boulder substrate did not exhibit a clearly distinguishable effect after accounting for other habitat and spatial factors. Similarly, aquatic vegetation did not show a strong or consistent relationship with fish occurrence once other covariates were included.

Taken together, these results indicate that fish distribution in the study system is governed by a combination of fine-scale habitat structure and broader spatial context. While structural habitat features exert the strongest direct influence on occurrence, substantial site-level variation persists, underscoring the importance of reach-scale processes that are not fully captured by microhabitat measurements alone.